Multi-document summarization using distortion-rate ratio

نویسندگان

  • Ulukbek Attokurov
  • Ulug Bayazit
چکیده

The current work adapts the optimal tree pruning algorithm(BFOS) introduced by Breiman et al.(1984) and extended by Chou et al.(1989) to the multi-document summarization task. BFOS algorithm is used to eliminate redundancy which is one of the main issues in multi-document summarization. Hierarchical Agglomerative Clustering algorithm(HAC) is employed to detect the redundancy. The tree designed by HAC algorithm is successively pruned with the optimal tree pruning algorithm to optimize the distortion vs. rate cost of the resultant tree. Rate parameter is defined to be the number of the sentences in the leaves of the tree. Distortion is the sum of the distances between the representative sentence of the cluster at each node and the other sentences in the same cluster. The sentences assigned to the leaves of the resultant tree are included in the summary. The performance of the proposed system assessed with the Rouge-1 metric is seen to be better than the performance of the DUC-2002 winners on DUC-2002 data set.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

متن کامل

Multi-Document Summarization Using Graph-Based Iterative Ranking Algorithms and Information Theoretical Distortion Measures

Text summarization is an important field in the area of natural language processing and text mining. This paper proposes an extraction-based model which uses graphbased and information theoretic concepts for multidocument summarization. Our method constructs a directed weighted graph from the original text by adding a vertex for each sentence, and compute a weighted edge between sentences which...

متن کامل

Extraction Based Multi Document Summarization using Single Document Summary Cluster

Multi document summarization has very great impact among research community, ever since the growth of online information and availability. Selecting most important sentences from such huge repository of data is quiet tricky and challenging task. While multi document poses some additional overhead in sentence selection, generating summaries for each individual documents and merging the sentences...

متن کامل

Measuring Importance and Query Relevance in Topic-focused Multi-document Summarization

The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio. We demonstrate that the...

متن کامل

Measuring Importance and Query Relevance in Toopic-Focused Multi-Document Summarization

The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio. We demonstrate that the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014